Conversation

@wking wking commented Feb 14, 2024

XCMSTRAT-513 includes OTA-1185 pitching local update services in managed regions. However, HyperShift had (until this pull) been forcing ClusterVersion spec.upstream empty. This pull is part of making the upstream update service configurable on HyperShift, so those clusters can hear about and assess known issues with updates in environments where the default https://api.openshift.com/api/upgrades_info update service is not accessible, or where an alternative update service is desired for testing or policy reasons. This includes OTA-1185, mentioned above, but would also include any other instance of disconnected/restricted-network use. The alternatives for folks who want HyperShift updates in places where api.openshift.com is inaccessible are:

  • Don't use an update service and manage that aspect manually. But the update service declares multiple new releases each week, as well as delivering information about known risks/issues for local clusters to evaluate their exposure. That's a lot of information to manage manually, if folks decide not to plug into the existing update service tooling.
  • Run a local update service, and fiddle with DNS and X.509 certs so that packets aimed at api.openshift.com get routed to your local service. This one requires less long-term effort than manually replacing the entire update service system, but it requires your clusters to trust an X.509 Certificate Authority that is willing to sign certificates for your local service saying "yup, that one is definitely api.openshift.com".

The implementation is similar to e438076 (#1954), where I added update-related status properties piped up via CVO -> ClusterVersion.status -> HostedControlPlane.status -> HostedCluster.status. This commit adds a new spec property, sketched below, that is piped down via:

  1. HostedCluster (management cluster) spec.updateService.
  2. HostedControlPlane (management cluster) spec.updateService.
  3. Cluster-version operator Deployment --update-service command line option (management cluster), and also ClusterVersion spec.upstream.
  4. Cluster-version operator container (management cluster).
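
For illustration, opting a cluster into a local update service happens at the top of that chain. A minimal sketch, assuming a hypothetical local service URL (all unrelated spec properties elided):

  apiVersion: hypershift.openshift.io/v1beta1
  kind: HostedCluster
  metadata:
    name: example
    namespace: clusters
  spec:
    # Hypothetical local update-service endpoint; release, platform, and
    # other required spec properties are elided here.
    updateService: https://updates.internal.example.com/api/upgrades_info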

The CVO's new --update-service option is from openshift/cluster-version-operator#1035, and we're using that pathway instead of ClusterVersion spec, because we don't want the update service URI to come via the customer-accessible hosted API. It's ok for the channel (which is used as a query parameter in the update-service GET requests) to continue to flow in via ClusterVersion spec. I'm still populating ClusterVersion's spec.upstream in this commit for discoverability, although because --update-service is being used, the CVO will ignore the ClusterVersion spec.upstream value.
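
At the bottom of that chain, the rendered CVO Deployment carries the flag roughly as follows (a sketch; the URL matches the hypothetical example above, and other arguments are elided):

  containers:
  - name: cluster-version-operator
    args:
    - start
    # While this flag is set, the CVO ignores ClusterVersion spec.upstream.
    - --update-service=https://updates.internal.example.com/api/upgrades_info
    # ...release-image and other arguments elided...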

When the HyperShift operator sets this in HostedControlPlane for an older cluster version, the old HostedControlPlane controller (launched from the hosted release payload) will not recognize or propagate the new property. But that's not a terrible thing: issues with fetching update recommendations from the default update service will still be reported in the RetrievedUpdates condition, with a message mentioning the update service URI. Although it doesn't look like we capture that condition currently:

$ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur frequently enough that dropping down to the ClusterVersion level to debug becomes tedious.
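
For context, a failure to reach the configured service surfaces in the hosted ClusterVersion roughly like this (the reason and message are illustrative placeholders, not captured output):

  status:
    conditions:
    - type: RetrievedUpdates
      status: "False"
      # Illustrative reason/message; the exact wording comes from the CVO.
      reason: RemoteFailed
      message: 'Unable to retrieve available updates: Get
        "https://updates.internal.example.com/api/upgrades_info?channel=stable-4.16":
        context deadline exceeded'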

If, in the future, we grow an option to give the hosted CVO kubeconfig access to both the management and hosted Kubernetes APIs, we could drop --update-service and have the hosted CVO reach out and read this configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (#1954). The recent 1ae3a36 (#3527) suggests that the "v1alpha1 is a superset of v1beta1" policy remains current.

Fixes OTA-1211

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 14, 2024
@openshift-ci-robot

openshift-ci-robot commented Feb 14, 2024

@wking: This pull request references OTA-1211 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this: [PR description quoted above]

@openshift-ci openshift-ci bot requested review from csrwng and hasueki February 14, 2024 08:28
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Feb 14, 2024
@wking wking force-pushed the upstream-update-service branch 2 times, most recently from 3a93d89 to e204679 on February 14, 2024 08:53
@wking wking force-pushed the upstream-update-service branch from e204679 to 872d597 on February 14, 2024 10:41
@netlify

netlify bot commented Feb 14, 2024

Deploy Preview for hypershift-docs ready!

🔨 Latest commit: de4bcbe
🔍 Latest deploy log: https://app.netlify.com/sites/hypershift-docs/deploys/660726e59cf2650008ec7826
😎 Deploy Preview: https://deploy-preview-3576--hypershift-docs.netlify.app

@openshift-ci openshift-ci bot added area/cli Indicates the PR includes changes for CLI area/documentation Indicates the PR includes changes for documentation labels Feb 14, 2024
@wking
Member Author

wking commented Feb 14, 2024

openshift/cluster-version-operator#1035 needs to land support for --upstream before the HostedControlPlane controller starts setting it via this pull.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2024
@wking wking force-pushed the upstream-update-service branch from 872d597 to 7d39a44 on February 14, 2024 11:32
@openshift-ci openshift-ci bot added the area/ci-tooling Indicates the PR includes changes for CI or tooling label Feb 14, 2024
@shellyyang1989

PR pre-merge tested and passed, so

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Mar 29, 2024
@openshift-ci-robot

openshift-ci-robot commented Mar 29, 2024

@wking: This pull request references OTA-1211 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this: [PR description quoted above]

[1] includes [2] pitching local update services in managed regions.
However, HyperShift had (until this commit) been forcing
ClusterVersion spec.upstream empty.  This commit is part of making the
upstream update service configurable on HyperShift, so those clusters
can hear about and assess known issues with updates in environments
where the default https://api.openshift.com/api/upgrades_info update
service is not accessible, or where an alternative update service
is desired for testing or policy reasons. This includes [2],
mentioned above, but would also include any other instance of
disconnected/restricted-network use. The alternatives for folks who
want HyperShift updates in places where api.openshift.com is
inaccessible are:

* Don't use an update service and manage that aspect manually.  But the
  update service declares multiple new releases each week, as well as
  delivering information about known risks/issues for local clusters
  to evaluate their exposure.  That's a lot of information to manage
  manually, if folks decide not to plug into the existing update
  service tooling.
* Run a local update service, and fiddle with DNS and X.509 certs so
  that packets aimed at api.openshift.com get routed to your local
  service.  This one requires less long-term effort than manually
  replacing the entire update service system, but it requires your
  clusters to trust an X.509 Certificate Authority that is willing to
  sign certificates for your local service saying "yup, that one is
  definitely api.openshift.com".

The implementation is similar to e438076
(api/v1beta1/hostedcluster_types: Add channel, availableUpdates, and
conditionalUpdates, 2022-12-14, openshift#1954), where I added update-related
status properties piped up via CVO -> ClusterVersion.status ->
HostedControlPlane.status -> HostedCluster.status.  This commit adds a
new spec property that is piped down via:

1. HostedCluster (management cluster) spec.updateService.
2. HostedControlPlane (management cluster) spec.updateService.
3. Cluster-version operator Deployment --update-service command line
   option (management cluster), and also ClusterVersion spec.upstream.
4. Cluster-version operator container (management cluster).

The CVO's new --update-service option is from [3], and we're using
that pathway instead of ClusterVersion spec, because we don't want the
update service URI to come via the customer-accessible hosted API.
It's ok for the channel (which is used as a query parameter in the
update-service GET requests) to continue to flow in via ClusterVersion
spec.  I'm still populating ClusterVersion's spec.upstream in this
commit for discoverability, although because --update-service is being
used, the CVO will ignore the ClusterVersion spec.upstream value.

When the HyperShift operator sets this in HostedControlPlane for an
older cluster version, the old HostedControlPlane controller (launched
from the hosted release payload) will not recognize or propagate the
new property.  But that's not a terrible thing, and issues with
fetching update recommendations from the default update service will
still be reported in the RetrievedUpdates condition [4] with a message
mentioning the update service URI if there are problems [5].  Although
it doesn't look like we capture that condition currently:

  $ grep -r 'ConditionType = "ClusterVersion' api/hypershift/v1beta1/
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionSucceeding ConditionType = "ClusterVersionSucceeding"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionUpgradeable ConditionType = "ClusterVersionUpgradeable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionFailing ConditionType = "ClusterVersionFailing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionProgressing ConditionType = "ClusterVersionProgressing"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionAvailable ConditionType = "ClusterVersionAvailable"
  api/hypershift/v1beta1/hostedcluster_conditions.go:     ClusterVersionReleaseAccepted ConditionType = "ClusterVersionReleaseAccepted"

We could opt to collect it in follow-up work if issues occur
frequently enough that dropping down to the ClusterVersion level to
debug becomes tedious.

If, in the future, we grow an option to give the hosted CVO kubeconfig
access to both the management and hosted Kubernetes APIs, we could
drop --update-service and have the hosted CVO reach out and read this
configuration off HostedControlPlane or HostedCluster directly.

I'm bumping v1alpha1 as described in 4af5ffa (api/v1alpha1: Catch
up with channel, availableUpdates, and conditionalUpdates, 2023-01-17, openshift#1954).
The recent 1ae3a36 (Always include AWS default security group in
worker security groups, 2024-02-05, openshift#3527) suggests that the "v1alpha1
is a superset of v1beta1" policy remains current.

The cmd/install/assets, client/applyconfiguration,
docs/content/reference/api.md, hack/app-sre/saas_template.yaml, and
vendor changes are automatic via:

  $ make update

[1]: https://issues.redhat.com/browse/XCMSTRAT-513
[2]: https://issues.redhat.com/browse/OTA-1185
[3]: openshift/cluster-version-operator#1035
[4]: https://github.com/openshift/api/blob/54b3334bfac52883d515c11118ca191bffba5db7/config/v1/types_cluster_version.go#L702-L706
[5]: https://github.com/openshift/cluster-version-operator/blob/ce6169c7b9b0d44c2e41342e6414ed9db0a31a63/pkg/cvo/availableupdates.go#L356-L365
@wking wking force-pushed the upstream-update-service branch from 7d39a44 to de4bcbe on March 29, 2024 20:38
@wking wking changed the title OTA-1211: api/v1beta1/hostedcluster_types: Add spec.upstream OTA-1211: api/v1beta1/hostedcluster_types: Add spec.updateService Mar 29, 2024
@wking
Member Author

wking commented Mar 29, 2024

openshift/cluster-version-operator#1035 merged and I've updated here with 7d39a44 -> de4bcbe to use the --update-service flag it settled on.

@openshift-ci-robot

openshift-ci-robot commented Mar 29, 2024

@wking: This pull request references OTA-1211 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this: [PR description quoted above]

@wking
Member Author

wking commented Mar 29, 2024

I can launch clusters with this new HostedControlPlane controller via Cluster Bot with "launch 4.16.0-0.ci,openshift/hypershift#3576 hypershift-hosted", but without building and launching my own HyperShift operator and updating the HostedCluster CRDs, I'm not sure how to test this pre-merge.

@wking
Member Author

wking commented Mar 29, 2024

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2024
@openshift-ci
Contributor

openshift-ci bot commented Mar 29, 2024

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                  Commit   Required  Rerun command
ci/prow/e2e-ibmcloud-roks  de4bcbe  false     /test e2e-ibmcloud-roks
ci/prow/e2e-ibmcloud-iks   de4bcbe  false     /test e2e-ibmcloud-iks


@sjenning sjenning left a comment

lgtm, just one nit

ControlPlaneReleaseImage *string `json:"controlPlaneReleaseImage,omitempty"`

// updateService may be used to specify the preferred upstream update service.
// By default it will use the appropriate update service for the cluster and region.
Contributor

By default, the field is empty. Any defaulting would occur above this API in OCM/ACM/etc. This comment confused me.

Contributor

Maybe as a follow-on clean up s/By default it/By default, the CVO/g

Member Author

More future alternative wording options:

By default, lower-level components will use the appropriate update service for the cluster and region.

or:

When empty, lower-level components will use the appropriate update service for the cluster and region.

to avoid committing to the CVO as the lower-level actor that is filling in an opinion when HostedControlPlane has an empty string here.

@sjenning
Contributor

sjenning commented Apr 1, 2024

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2024
@openshift-ci
Contributor

openshift-ci bot commented Apr 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning, wking


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 0adc4e6 into openshift:main Apr 1, 2024
@wking wking deleted the upstream-update-service branch April 1, 2024 23:24
wking added a commit to wking/hypershift that referenced this pull request Sep 17, 2024
To make it easier for folks working at the management-cluster level to
notice and address issues with update-recommendation retrieval.

Especially useful since de4bcbe (api/v1beta1/hostedcluster_types:
Add spec.updateService, 2024-03-29, openshift#3576) made updateService a
configurable knob, because:

* Users might configure a broken updateService, and the new condition
  would tell them that it wasn't working.
* Users might install a 4.15 or earlier HostedCluster, where the
  HostedControlPlane controller (which is extracted from the
  HostedCluster's release image) was too old to have de4bcbe, so it
  doesn't propagate the configured updateService, and the
  cluster-version operator ends up using the default upstream update
  service.  That doesn't work in disconnected/restricted-network
  environments where api.openshift.com is unreachable.  The new
  condition would tell them that it wasn't working, and that the CVO
  was using the api.openshift.com service.  They'd be left on their
  own to realize that the updateService feature required a 4.16 or
  later HostedCluster release image.
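
A sketch of how such a condition might read on a HostedCluster, assuming a hypothetical condition type name (this follow-up defines the real one):

  status:
    conditions:
    - type: ClusterVersionRetrievedUpdates  # hypothetical type name
      status: "False"
      reason: RemoteFailed                  # illustrative
      message: 'Unable to retrieve available updates from
        "https://api.openshift.com/api/upgrades_info": ...'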
wking added a commit to wking/hypershift that referenced this pull request Sep 17, 2024
bryan-cox pushed a commit to bryan-cox/hypershift that referenced this pull request Sep 25, 2024
pamelachristie pushed a commit to pamelachristie/hypershift that referenced this pull request Mar 31, 2025